An Improved GAN for Generating High-Fidelity Cough Sounds

Moon-Ki Back; Kyu-Chul Lee

Database Research (SIGDB)

English Title	An improved GAN for generating high-fidelity synthesized cough sounds
Authors	Moon-Ki Back, Kyu-Chul Lee
Citation	Vol. 36, No. 3, pp. 36-55 (December 2020)
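Korean Abstract	GANs (Generative Adversarial Networks) have become very popular in computer vision and are widely used for image generation. Recently, GAN researchers have also begun using GANs to generate sound data. Unlike an image, a waveform is a signal composed of discrete values, so it is difficult to learn waveforms with CNNs (Convolutional Neural Networks), which are mainly used to learn images. To overcome this, GAN researchers proposed reusing existing image-generating GANs to learn time-frequency representations instead of waveforms. Following this approach, this paper proposes an improved sound-generating GAN that increases the fidelity of generated waveforms. The improved GAN uses HPSS (Harmonic Percussive Source Separation) to extract spectral features over time and improves the quality of generated waveforms through a progressively growing network. We train the proposed GAN on a public cough dataset and evaluate its performance in terms of fidelity and diversity.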
English Abstract	Generative Adversarial Networks (GANs) have gained tremendous popularity in computer vision and are widely used for image generation. Recently, GAN researchers have started generating sound data with GANs. Unlike an image, a waveform is a sampled signal consisting of discrete samples, so it is difficult to learn waveforms with Convolutional Neural Networks (CNNs), which are mainly trained on natural images. To overcome this difficulty, GAN researchers proposed training on time-frequency representations instead of time-series waveforms, which allows existing image-generating GANs to be reused. Following this approach, we propose an improved sound-generating GAN that increases the fidelity of generated waveforms. Our network first uses Harmonic Percussive Source Separation (HPSS) to extract spectral features over time and then improves the quality of generated waveforms with progressively growing networks. We train our GAN on a public cough dataset and evaluate the fidelity and diversity of the generated waveforms.
Keywords	Generative Adversarial Network; generative model; sound data; cough sound; high fidelity
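The abstract says HPSS is used to extract spectral features over time but does not give the authors' settings. As a minimal sketch, assuming the widely used median-filtering formulation of HPSS (the approach behind `librosa.decompose.hpss`) with soft masks, the NumPy code below splits a magnitude spectrogram into harmonic and percussive parts. The function names, FFT size, hop length, kernel size, mask exponent, and the synthetic tone-plus-clicks signal are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram of shape (n_fft // 2 + 1, n_frames), Hann-windowed."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def median_filter_1d(S, kernel, axis):
    """Median-filter a 2-D array along one axis; edge padding keeps the shape."""
    pad = kernel // 2  # assumes an odd kernel
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    # Windows are appended as a new last axis, so take the median over it.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel, axis=axis)
    return np.median(windows, axis=-1)


def hpss(S, kernel=17, power=2.0, eps=1e-12):
    """Median-filtering HPSS with soft masks (hypothetical parameters).

    Harmonic energy is steady over time, so it survives a median filter
    along the time axis; percussive energy is broadband and brief, so it
    survives a median filter along the frequency axis.
    """
    H_med = median_filter_1d(S, kernel, axis=1)  # smooth along time
    P_med = median_filter_1d(S, kernel, axis=0)  # smooth along frequency
    Hp, Pp = H_med ** power, P_med ** power
    mask_h = Hp / (Hp + Pp + eps)
    mask_p = Pp / (Hp + Pp + eps)
    return S * mask_h, S * mask_p


# Hypothetical test signal: a 440 Hz tone plus sparse clicks, 1 s at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # harmonic part
clicks = np.zeros_like(t)
clicks[::2000] = 1.0  # percussive part: one impulse every 0.25 s
S = stft_magnitude(tone + clicks)
H, P = hpss(S)
```

With these settings the tone's energy (near frequency bin 28, since 440 Hz / 15.625 Hz per bin is about 28.2) lands almost entirely in `H`, while the broadband click energy in the upper bins lands in `P`. How the paper feeds such components into its progressively growing generator is not described in the abstract.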